Covid-19 vaccines in Italian public opinion: Identifying key issues using Twitter and Natural Language Processing

The COVID-19 pandemic has changed society and people's lives. In Italy, the vaccination campaign started on 27 December 2020, together with most countries in the European Union. Social media platforms can offer relevant information about how citizens have experienced and perceived the availability of vaccines and the start of the vaccination campaign. This study uses machine learning methods to extract sentiments and topics relating to COVID-19 vaccination from Twitter. Between February and May 2021, we collected over 71,000 tweets containing vaccine-related keywords from Italian Twitter users. To determine the dominant sentiment across the Italian population, we performed spatial and temporal sentiment analysis using VADER, which highlighted sentiment fluctuations strongly influenced by news of vaccine side effects. Additionally, we investigated the opinions of Italians about different vaccine brands: the Oxford-AstraZeneca vaccine was the least appreciated. The Dynamic Latent Dirichlet Allocation (DLDA) model revealed three fundamental topics, which remained stable over time: vaccination plan info, usefulness of vaccinating, and concerns about vaccines (risks, side effects and safety). To the best of our knowledge, this is one of the first studies on Twitter to identify opinions about COVID-19 vaccination in Italy and their progression over the first months of the vaccination campaign. Our results can help policymakers and research communities track public attitudes towards COVID-19 vaccines and make decisions to promote the vaccination campaign.


Introduction
SARS-CoV-2 infections are estimated to be already over 394 million worldwide, with at least 5.7 million COVID-19-related deaths as of 1 February 2022 [1]. Mortality aside, the disease has proved to be an unprecedented burden on both the economy and the healthcare system in Italy [2]. The vaccination campaigns represented a new hope of putting an end to the pandemic. The Italian vaccination campaign started on 27 December 2020. Vaccines were first administered to healthcare workers and nursing home residents. Starting in February 2021, the entire population was progressively involved according to medical and social priorities [3].
In a coordinated effort with the EU Commission, the Italian government purchased, gradually and depending on availability, different brands of vaccines: Pfizer-BioNTech, Moderna, Oxford-AstraZeneca and Johnson & Johnson.
Although all of these vaccines were authorised both medically and politically by the European and Italian institutions, a number of people were not comfortable with the idea of being vaccinated, and a minority of them refused vaccination. This unleashed a heated debate which inevitably landed on social media. With almost 3.8 billion users across the globe [4], social media platforms are a valuable data source, and researchers have often analysed comments extracted from platforms such as Facebook [5], Twitter [6,7] and Instagram [8] regarding political, business and healthcare issues.
This paper applies Natural Language Processing (NLP) techniques to explore major topics and sentiments of tweets about COVID-19 vaccines among Twitter users in Italy.
In this paper we aim to answer three main questions related to the first 4 months of the vaccination campaign in Italy:
• What has been the dominant sentiment towards COVID-19 vaccines? We addressed this through a sentiment analysis of vaccine-related Italian tweets, also detailing the sentiment in the various Italian regions. We used VADER as the sentiment analysis tool [9].
• Which brands of COVID-19 vaccine have been most talked about in Italy? Do people prefer some brands over others? To answer this, we explored the sentiment of Italian Twitter users towards different COVID-19 vaccine brands, extracting vaccine brand hashtags from tweets and using VADER to analyse people's preferences with respect to the different brands.
• What were the main topics of discussion regarding COVID-19 vaccines? Did public opinion change during the months of the vaccination campaign? To address this, we used the dynamic version of the Latent Dirichlet Allocation model (DLDA) [10] to explore the most popular themes and their evolution over time.

The data
The aim of this paper is to investigate public opinion and perception of COVID-19 vaccines in Italy using tweets containing both the #covid and #vaccine hashtags over the period from 1 February to 31 May 2021. Raw data were provided by 'TrackMyHashtag' [11], and a total of 73,688 tweets were collected. Each tweet was written in Italian and, in addition to its text content, included the date on which it was created, the user's Twitter screen name, the device used to post it, the number of retweets ('reposted tweets'), the tweet's URLs, the number of followers and the number of likes. Approximately 18,000 of the collected tweets also contained the user's tweeting location.
The Twitter dataset was processed and analysed with Python 3 in the following steps:
1. Raw data pre-processing. Natural Language Processing (NLP) techniques were used to pre-process raw tweets, with the aim of cleaning the text and removing irrelevant information.
2. Sentiment Analysis. To identify and extract attitudes, opinions, evaluations and emotions within the gathered tweets, we ran sentiment analysis with VADER (Valence Aware Dictionary and sEntiment Reasoner) [9].
3. Topic Extraction. After suitable pre-processing, Dynamic Latent Dirichlet Allocation (DLDA) was applied to detect the main topics in the collection of tweets and their evolution over the months.

Data pre-processing
A Tweet is a microblog message posted on Twitter. The text of a Tweet is limited to 280 characters, or 257 characters if a link is included, because every link counts as 23 characters. Raw Tweets are highly unstructured and contain redundant information.
To convert raw data into an easily readable format for the sentiment and topic analyses, we applied the following pre-processing steps using NLP techniques in Python:
3. Removed HTML tags (such as <div>, <p>, etc.)
4. Removed unnecessary line breaks
5. Removed special characters and punctuation except for exclamation points (exclamation points are kept because VADER uses them to amplify sentiment intensity)
6. Removed words that are numbers
7. Translated the Italian tweets into English using the 'googletrans' tool [12].
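The cleaning steps 3 to 6 above can be sketched with regular expressions. This is a minimal illustration, not the authors' code: it assumes each tweet is a plain Python string, the patterns and the function name `clean_tweet` are our own choices, and the translation step (7) is left out.

```python
import re

TAG_RE = re.compile(r"<[^>]+>")         # HTML tags such as <div> or <p>
LINE_BREAK_RE = re.compile(r"[\r\n]+")  # one or more line breaks
PUNCT_RE = re.compile(r"[^\w\s!]")      # anything that is not a word character, whitespace or '!'
NUMBER_RE = re.compile(r"\b\d+\b")      # tokens made only of digits
SPACES_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Apply cleaning steps 3-6 to one raw tweet (illustrative sketch)."""
    text = TAG_RE.sub(" ", text)         # step 3: remove HTML tags
    text = LINE_BREAK_RE.sub(" ", text)  # step 4: remove line breaks
    text = PUNCT_RE.sub(" ", text)       # step 5: remove special characters and punctuation, keep '!'
    text = NUMBER_RE.sub(" ", text)      # step 6: remove words that are numbers
    return SPACES_RE.sub(" ", text).strip()


print(clean_tweet("<p>Oggi 2 dosi!\nVaccino AstraZeneca, sospeso?</p>"))
# prints: Oggi dosi! Vaccino AstraZeneca sospeso
```

Since Python's `\w` is Unicode-aware, accented Italian letters survive step 5.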
For the topic model, a higher-quality dataset was required. Duplicate tweets were removed so that only unique tweets remained. In addition to the general data cleaning steps, we applied tokenization and lemmatization to help the model achieve better performance.
We tokenized the text using the Gensim library [13] and converted all text to lowercase so that every word appears in a consistent format. We then pruned the vocabulary by removing stop words and terms unrelated to the topic, and built a bigram model. Finally, lemmatization was performed with the spaCy library [14].
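The tokenization, stop-word and bigram steps can be illustrated with the standard library alone. This is a simplified stand-in, not the authors' pipeline: in practice Gensim's `simple_preprocess` and `Phrases` (which uses a scored threshold rather than a raw count) would do this work, the stop-word list here is a tiny hypothetical one, and lemmatization with spaCy is omitted.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (hypothetical); a real pipeline uses a full English list
STOP_WORDS = {"i", "my", "the", "of", "a", "to", "and", "after"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def merge_frequent_bigrams(docs, min_count=2):
    """Join adjacent token pairs seen at least min_count times with '_'."""
    counts = Counter(pair for doc in docs for pair in zip(doc, doc[1:]))
    frequent = {pair for pair, c in counts.items() if c >= min_count}
    merged = []
    for doc in docs:
        out, i = [], 0
        while i < len(doc):
            if i + 1 < len(doc) and (doc[i], doc[i + 1]) in frequent:
                out.append(doc[i] + "_" + doc[i + 1])  # treat the pair as one token
                i += 2
            else:
                out.append(doc[i])
                i += 1
        merged.append(out)
    return merged


tweets = ["I got my first dose today", "The first dose of the vaccine",
          "Side effects after the first dose"]
docs = [[t for t in tokenize(tw) if t not in STOP_WORDS] for tw in tweets]
print(merge_frequent_bigrams(docs))
# prints: [['got', 'first_dose', 'today'], ['first_dose', 'vaccine'], ['side', 'effects', 'first_dose']]
```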

Sentiment analysis
Humans associate words, phrases, and sentences with emotion, and Sentiment analysis uses computational algorithms to extract and measure the emotion expressed within a text. In the present study, the Valence Aware Dictionary and sEntiment Reasoner (VADER) model was used to decode the sentiments, opinions, evaluations and emotions regarding COVID-19 vaccines within collected Tweets.
Introduced in 2014 by Hutto and Gilbert [9] and designed with a focus on social media texts, VADER combines a gold-standard sentiment lexicon with a rule-based sentiment analysis tool. It works exceedingly well in microblog-like contexts, where short texts mix a variety of features and a single sentence may express several sentiments at once. VADER is freely available as the vaderSentiment Python module, is also incorporated into the Python NLTK package, and can be applied directly to unlabelled text data. We chose VADER for its many advantages over traditional methods of sentiment analysis, including:
• It does not require any training data. It is built from a crowd-validated gold-standard sentiment lexicon (along with the associated sentiment intensity measures), which is specifically attuned to sentiment in microblog-like contexts such as Twitter.
• It has been shown to achieve excellent classification accuracy on Tweets, outperforming individual human raters and performing as well as (and in most cases better than) other highly regarded sentiment analysis tools [9].
• It is computationally fast and does not suffer significantly from a speed-performance trade-off.
Through its rules, VADER accounts for several lexical features: punctuation, capitalization, degree modifiers, the contrastive conjunction 'but', and negations within the three words preceding a term, which flip its polarity.
The raw sentiment score of a sentence is calculated by summing the valence ratings of each word in the sentence that is listed in the VADER dictionary. After normalization, VADER returns a 'compound' sentiment score (S_s) in the range −1 to 1, from most negative to most positive. Threshold values on S_s then allow us to classify each tweet by its polarity: negative, positive or neutral (see Table 1).
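The normalization and thresholding steps can be made concrete. In the reference vaderSentiment implementation, the summed valence x is mapped to S_s = x / sqrt(x² + α) with α = 15. The sketch below reproduces only this normalization and the classification, assuming the ±0.05 cut-offs of Table 1 (scores at or beyond ±0.05 count as polar). It skips VADER's rule-based adjustments, and the valence values are made up. With the real package one would instead call `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]`.

```python
import math

ALPHA = 15.0  # normalization constant used in the reference vaderSentiment implementation


def compound(valences, alpha=ALPHA):
    """Sum the word valences and squash the total into [-1, 1] as x / sqrt(x^2 + alpha)."""
    x = sum(valences)
    score = x / math.sqrt(x * x + alpha)
    return max(-1.0, min(1.0, score))


def polarity(score, threshold=0.05):
    """Classify a compound score with symmetric thresholds (assumed +/-0.05, as in Table 1)."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


# Hypothetical, already rule-adjusted valences of the lexicon words in one tweet
tweet_valences = [1.9, -0.4]
s = compound(tweet_valences)
print(round(s, 3), polarity(s))  # prints: 0.361 positive
```

Because the map saturates, long tweets with many positive words approach but never exceed 1.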

Topic modeling
Topic Modelling is an unsupervised machine learning technique that aims to identify the topics that best describe the contents of a set of documents. It is a text mining procedure with which the themes of documents can be identified from a large document corpus [15]. The Latent Dirichlet Allocation (LDA) model is one of the most popular topic modelling methods. It is a generative probabilistic model of a corpus based on a three-level hierarchical Bayesian model. The basic idea of LDA is that each document is a mixture of topics, and each topic is defined as a distribution over words [16]. In LDA, each document in the corpus is generated by the following process:
1. A topic mixture θ over the K topics is sampled from a Dirichlet prior parameterized by α, p(θ | α);
2. For each word position n, a topic z_n is sampled from the multinomial document-topic distribution θ, which models p(z_n = i | θ);
3. For each topic k = 1, ..., K, the word distribution φ_k is also a multinomial distribution, drawn from a Dirichlet prior with hyper-parameter β;
4. Given the topic z_n, a word w_n is then sampled from the multinomial distribution p(w_n | z_n, β).
Overall, the probability of a document (a tweet, in our case) w = (w_1, ..., w_N) containing N words can be described as:
$$p(\mathbf{w} \mid \alpha, \beta) = \int p(\theta \mid \alpha) \left( \prod_{n=1}^{N} \sum_{z_n} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta) \right) d\theta \tag{1}$$
Finally, the probability of the corpus of M documents D = {w_1, ..., w_M} can be expressed as the product of the marginal probabilities of each single document w_m, as shown in (2):
$$p(D \mid \alpha, \beta) = \prod_{m=1}^{M} \int p(\theta_m \mid \alpha) \left( \prod_{n=1}^{N_m} \sum_{z_{mn}} p(z_{mn} \mid \theta_m)\, p(w_{mn} \mid z_{mn}, \beta) \right) d\theta_m \tag{2}$$
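The four-step generative process can be simulated to show what LDA assumes about how a tweet is produced. This is a minimal stdlib sketch with a toy vocabulary and hyper-parameter values of our own choosing. Dirichlet draws are obtained by normalizing Gamma variates. Since the topic-word distributions φ_k of step 3 are shared by the whole corpus, the sketch draws them once before generating any document.

```python
import random


def sample_dirichlet(concentration, rng):
    """Draw from a Dirichlet distribution by normalizing independent Gamma(a_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in concentration]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(n_words, alpha, phi, vocab, rng):
    """Generate one document following steps 1, 2 and 4 of the LDA process."""
    theta = sample_dirichlet(alpha, rng)                      # step 1: theta ~ Dir(alpha)
    words = []
    for _ in range(n_words):
        z = rng.choices(range(len(theta)), weights=theta)[0]  # step 2: z_n ~ Mult(theta)
        words.append(rng.choices(vocab, weights=phi[z])[0])   # step 4: w_n ~ Mult(phi_{z_n})
    return theta, words


rng = random.Random(0)
vocab = ["people", "get", "vaccinate", "day", "dose", "effect", "risk"]  # toy vocabulary
K = 3
beta = [0.5] * len(vocab)
phi = [sample_dirichlet(beta, rng) for _ in range(K)]  # step 3: phi_k ~ Dir(beta)
theta, tweet = generate_document(8, [0.5] * K, phi, vocab, rng)
print([round(t, 2) for t in theta], tweet)
```

Fitting LDA inverts this process: given only the words, inference recovers plausible θ and φ.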
In our analysis, which covers four months of tweets, we found that the content of tweets changes over time, so the corpus cannot be treated as static. We therefore adopted Dynamic Latent Dirichlet Allocation (DLDA), in which documents are grouped into time slices (epochs) and a state-space model handles the transitions of the topics from one epoch to the next. The topic parameters evolve according to a Gaussian model, which adds time as a further dimension and yields posterior probabilities for the evolving topics along the timeline. Fig 1 shows a graphical representation of the Dynamic Topic Model (DTM) [10]. As a member of the class of probabilistic topic models, the DTM can capture how the themes of the tweets evolved. Our corpus spans four months (February to May 2021), so the dynamic topic model was applied to four time steps, one for each month. These time slices were passed to the DTM implementation provided by Gensim [13].
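The Gaussian state-space assumption can be illustrated by simulating how one topic drifts across the four monthly slices. In the DTM [10], a topic's natural parameters follow a random walk, β_t = β_{t-1} + Gaussian noise, and its word probabilities at slice t are the softmax of β_t. The sketch below only simulates this prior. It does not perform inference, and the vocabulary and σ value are hypothetical.

```python
import math
import random


def softmax(values):
    """Map natural parameters to a probability distribution over the vocabulary."""
    top = max(values)
    exps = [math.exp(v - top) for v in values]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def evolve_topic(beta0, n_slices, sigma, rng):
    """Simulate one topic's word distributions across time slices.

    beta_t = beta_{t-1} + Gaussian noise with standard deviation sigma;
    the word probabilities at slice t are softmax(beta_t).
    """
    betas = [list(beta0)]
    for _ in range(1, n_slices):
        betas.append([b + rng.gauss(0.0, sigma) for b in betas[-1]])
    return [softmax(b) for b in betas]


rng = random.Random(42)
vocab = ["good", "need", "risk", "dose"]  # toy vocabulary
months = ["Feb", "Mar", "Apr", "May"]     # four time slices, as in this study
trajectory = evolve_topic([0.0] * len(vocab), len(months), sigma=0.3, rng=rng)
for month, probs in zip(months, trajectory):
    print(month, {w: round(p, 3) for w, p in zip(vocab, probs)})
```

For inference, Gensim's `LdaSeqModel` takes a `time_slice` list giving the number of documents in each slice, with the corpus sorted chronologically. Its `chain_variance` parameter plays the role of σ² here.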
An essential challenge in DLDA (as in LDA) is determining an appropriate number of topics. Röder et al. [17] proposed coherence scores to evaluate the quality of each topic model. Topic coherence measures how semantically consistent the top words of each topic inferred by a model are. As the coherence measure, we used C_v, which is based on a sliding window and combines normalized pointwise mutual information (NPMI) with cosine similarity.
These scores aim to emulate the score that a human would likely assign to a topic and indicate how much the topic words 'make sense', by measuring the cohesiveness between the 'top' words within a given topic. In addition, we considered the distribution of topics on the principal component analysis (PCA) map, which visualizes the topic models in a two-dimensional word space. An even spread of topics is preferred, as it indicates a high degree of independence between topics. A good model therefore combines higher coherence with an even distribution of topics on the principal component map displayed by pyLDAvis [18].
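The NPMI ingredient of C_v can be sketched directly. This is a simplified relative of C_v, often called C_npmi, that averages the NPMI of all pairs of a topic's top words. Word probabilities are estimated over boolean sliding windows of 110 tokens, Gensim's default window for C_v. The full C_v measure additionally compares NPMI context vectors with cosine similarity. The corpus and topics below are toy examples.

```python
import math
from itertools import combinations


def window_counts(docs, window=110):
    """Count in how many sliding windows each word and each word pair occurs."""
    single, pair, n_windows = {}, {}, 0
    for doc in docs:
        # a document shorter than the window forms a single window
        for i in range(max(1, len(doc) - window + 1)):
            present = set(doc[i:i + window])
            n_windows += 1
            for w in present:
                single[w] = single.get(w, 0) + 1
            for a, b in combinations(sorted(present), 2):
                pair[(a, b)] = pair.get((a, b), 0) + 1
    return single, pair, n_windows


def npmi(w1, w2, single, pair, n):
    """Normalized pointwise mutual information of two words, in [-1, 1]."""
    p12 = pair.get(tuple(sorted((w1, w2))), 0) / n
    if p12 == 0:
        return -1.0  # the words never co-occur
    if p12 == 1:
        return 1.0   # the words co-occur in every window
    p1, p2 = single[w1] / n, single[w2] / n
    return math.log(p12 / (p1 * p2)) / -math.log(p12)


def npmi_coherence(top_words, docs, window=110):
    """Mean NPMI over all pairs of a topic's top words."""
    single, pair, n = window_counts(docs, window)
    scores = [npmi(a, b, single, pair, n) for a, b in combinations(top_words, 2)]
    return sum(scores) / len(scores)


docs = [["vaccine", "dose", "day"], ["vaccine", "dose"], ["risk", "effect"], ["vaccine", "risk"]]
print(npmi_coherence(["vaccine", "dose"], docs))  # words that co-occur: positive score
print(npmi_coherence(["dose", "effect"], docs))   # words that never co-occur: -1.0
```

The reference C_v value for a trained model is obtained in Gensim with `CoherenceModel(model=lda, texts=texts, dictionary=dictionary, coherence="c_v").get_coherence()`.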

Exploring the COVID-19 vaccine tweets
The frequencies of the 40 most frequent terms are shown in Fig 2. The only term expressing concern is 'deaths', owing to some deaths that the media had attributed to certain batches of the Oxford-AstraZeneca vaccine; most of the other terms refer to the number of doses available and to the progress of the vaccination campaign.
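A term-frequency ranking like the one in Fig 2 can be computed with a counter over the pre-processed tokens. A minimal sketch, assuming each tweet is already a list of tokens; the toy tweets are hypothetical.

```python
from collections import Counter


def top_terms(token_lists, n=40):
    """Return the n most frequent tokens across all pre-processed tweets."""
    counts = Counter(token for tokens in token_lists for token in tokens)
    return counts.most_common(n)


# Toy pre-processed tweets (hypothetical)
tweets = [["dose", "vaccine", "campaign"], ["dose", "deaths"], ["vaccine", "dose"]]
print(top_terms(tweets, n=2))  # prints: [('dose', 3), ('vaccine', 2)]
```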
The 'location' field, available for approximately 18,000 correctly geolocated tweets, gives the number of tweets in each of the 20 Italian regions. The regions with the highest number of tweets are Lazio and Lombardy, which are also the most populous (Fig 3a). Fig 3b shows the 'tweet rate' of each region, i.e. the number of tweets per 1,000 residents: Lazio has the highest rate, closely followed by Lombardy and Liguria.
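The tweet rate normalizes raw counts by population, which is why the ranking in Fig 3b can differ from the one in Fig 3a. A minimal sketch with hypothetical regions and figures (not the study's data): a region with fewer tweets can still rank first if it has fewer residents.

```python
def tweet_rate(n_tweets, residents):
    """Number of tweets per 1,000 residents."""
    return 1000 * n_tweets / residents


def rank_regions(tweet_counts, populations):
    """Sort regions by tweet rate, highest first."""
    rates = {r: tweet_rate(n, populations[r]) for r, n in tweet_counts.items()}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


# Hypothetical figures, not taken from the study
counts = {"Region A": 1500, "Region B": 900}
populations = {"Region A": 1_200_000, "Region B": 600_000}
print(rank_regions(counts, populations))  # prints: [('Region B', 1.5), ('Region A', 1.25)]
```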

Sentiment analysis
The output of the VADER model shows a moderate predominance of negative public opinion (Table 2). Table 3 shows three example tweets with their polarity.
In Italy, the most talked-about COVID-19 vaccine brands were Oxford-AstraZeneca (66.1%) and Pfizer-BioNTech (14.89%; see Table 4). Even after the positive verdict of the European Medicines Agency (EMA), issued eight days after the suspension of the AstraZeneca vaccine, the climate of uncertainty [19] continued to affect the vaccination campaign, leading to the phenomenon of vaccine hesitancy: reluctance, delay and indecision about being vaccinated out of fear of adverse reactions to the shot, eventually leading to increased distrust of vaccines and of pharmaceutical regulatory institutions.
As the timeline of sentiments about vaccine brands shows (Fig 5), the suspension of AstraZeneca administrations brought negative opinions on all the other brands except Sputnik-V. The latter has never been available in Italy, despite its approval in 18 non-EU countries [20]. Moreover, in February 2021, The Lancet published the results of a study on the efficacy of the Sputnik-V vaccine [21]: the study showed that the vaccine was 91.6 per cent effective at preventing COVID-19, a much higher value than that of Oxford-AstraZeneca (70.4 per cent), published in the same journal two months earlier [22]. Sputnik-V also had a lower rate of severe side effects than AstraZeneca: 0.30 per cent against 0.69 per cent. Therefore, a possible reading of the sentiment analysis results is that Italians preferred Sputnik-V over AstraZeneca for its greater effectiveness and its lower percentage of severe side effects.
AstraZeneca has the highest negative average sentiment score, with a small standard error of the mean, while Sputnik-V has the highest positive average score, with a larger standard error than AstraZeneca (Fig 6).
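The standard error in Fig 6 is the sample standard deviation divided by the square root of the number of tweets. One likely reason why AstraZeneca's error bar is small is therefore simply that it is mentioned in most brand-related tweets. The sketch below computes the mean and standard error per brand. The brand names and scores are hypothetical.

```python
import math
from statistics import mean, stdev


def mean_and_sem(scores):
    """Mean compound score and its standard error: sample SD / sqrt(n)."""
    if len(scores) < 2:
        raise ValueError("at least two scores are needed")
    return mean(scores), stdev(scores) / math.sqrt(len(scores))


# Hypothetical compound scores per brand; not the study's data
scores_by_brand = {
    "Brand X": [-0.6, -0.4, -0.5, -0.3, -0.7, -0.5, -0.4, -0.6],  # many, consistent tweets
    "Brand Y": [0.8, -0.1, 0.6],                                  # few, dispersed tweets
}
for brand, scores in scores_by_brand.items():
    m, se = mean_and_sem(scores)
    print(brand, round(m, 3), round(se, 3))
```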
The regional analysis was performed only on the approximately 18,000 tweets with a regional geolocation. Fig 7 shows the average sentiment scores of the Italian regions: the score is neutral (between -0.05 and +0.05, see Table 1) in all regions. This is consistent with the absence of major differences in the vaccination campaign from region to region. However, averaging also flattens the scores, since positive and negative tweets partly cancel each other out and pull the regional means towards zero.

The topic model
To explore what users were concerned about on Twitter with regard to COVID-19 vaccines, we applied the LDA model to our clean corpus, keeping only tokens tagged as nouns, adjectives, verbs and adverbs.
To determine the appropriate number of topics, we used C_v [17] as the measure of topic coherence (see the Topic modeling section), together with the distribution of topics on the principal component map displayed by pyLDAvis [18], preferring models with higher coherence and an even spread of topics.
We trained LDA models with the number of topics k ranging from 2 to 30 and calculated the coherence of each model. As shown in Fig 8, the coherence score peaked at 3, 5 and 11 topics. Choosing 5 or 11 topics would lead to an uneven distribution on the principal component map, meaning that the topics are not highly independent (see the LDAvis interactive maps in S1-S3 Files). We therefore chose 3 topics: the topics do not overlap on the map, they summarize the whole word space well, and they remain relatively independent [23].
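Picking candidate topic numbers from a coherence curve like Fig 8 amounts to finding its local maxima. A minimal sketch over a dictionary of k to coherence. The scores below are hypothetical, and the final choice among peaks still rests on inspecting the pyLDAvis maps as described above.

```python
def coherence_peaks(scores):
    """Return the k values whose coherence is higher than both neighbouring k values."""
    ks = sorted(scores)
    return [k for prev_k, k, next_k in zip(ks, ks[1:], ks[2:])
            if scores[k] > scores[prev_k] and scores[k] > scores[next_k]]


# Hypothetical C_v scores for k = 2..7
scores = {2: 0.40, 3: 0.46, 4: 0.44, 5: 0.47, 6: 0.45, 7: 0.43}
print(coherence_peaks(scores))  # prints: [3, 5]
```

With Gensim, each score would come from training `LdaModel(corpus=corpus, id2word=dictionary, num_topics=k)` and scoring it with `CoherenceModel(..., coherence="c_v")`.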
The LDA results are shown in Table 5. The first theme accounts for 45% of the total tokens and includes tokens such as 'people', 'get' and 'vaccinate'. From this, we inferred that most people were interested in the vaccination campaign and wanted to be vaccinated. The second theme accounts for 30% of the total tokens and includes the words 'day', 'vaccine', 'dose' and 'administration', showing people's interest in the vaccination plan. Finally, the third topic contains words such as 'effect', 'health', 'suspension' and 'patent', which indicates concerns about vaccination measures and side effects. See Fig 9 for a word cloud representation of the three topics and S1 File for their interactive map.
The LDA model output identified the following three topics: (1) vaccination plan info, (2) usefulness of vaccinating and (3) concerns about vaccines (risks, side effects and safety). The pie charts in Fig 10 show the dynamic volume of each topic in the four periods. It is worth noting that the topics remained quite stable over the months, except for topic 2 (usefulness of vaccinating), which grew slightly in the second month and then decreased in the following months. Fig 11 plots the probabilities of some notable terms (the most probable words in the topics predicted by the DTM) that changed direction over the period. During the first four months of the vaccination campaign, attributes referring to vaccines such as 'good' and 'need' decreased in probability in the model.

PLOS ONE
This decrease coincided with the suspension of the AstraZeneca vaccine throughout Italy [24], ordered so that checks could be made in relation to the deaths of two men [25]. A few days later, AIFA (the Italian Medicines Agency) revoked the ban and vaccinations resumed. This could explain the slight rise in the use of the term 'good' between April and May. However, after these events, people began to have doubts not only about the effectiveness of vaccines, but also about the risk of side effects and about whether it was necessary to be vaccinated. In our opinion, this could be why the probability of the term 'risk' progressively increased over the months, while the probability of the term 'need', which expresses people's need to be vaccinated, slowly decreased (Fig 11).
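The per-period topic volumes shown in the Fig 10 pie charts can be computed from the document-topic distributions produced by the model. A minimal sketch, assuming each tweet is assigned to its dominant (most probable) topic and shares are taken within each month. Averaging the full topic proportions per month would be an equally reasonable alternative. The distributions below are hypothetical.

```python
from collections import Counter


def topic_shares(doc_topics, slice_ids, n_topics):
    """Share of documents whose dominant topic is k, for each time slice.

    doc_topics: per-document topic distributions; slice_ids: time slice of each document.
    """
    shares = {}
    for s in sorted(set(slice_ids)):
        dominant = [max(range(n_topics), key=lambda k: dist[k])
                    for dist, sid in zip(doc_topics, slice_ids) if sid == s]
        counts = Counter(dominant)
        shares[s] = [counts.get(k, 0) / len(dominant) for k in range(n_topics)]
    return shares


# Hypothetical document-topic distributions for four tweets in two months
doc_topics = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.6, 0.3, 0.1], [0.2, 0.2, 0.6]]
slice_ids = ["Feb", "Feb", "Mar", "Mar"]
print(topic_shares(doc_topics, slice_ids, 3))
# prints: {'Feb': [0.5, 0.5, 0.0], 'Mar': [0.5, 0.0, 0.5]}
```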

Limitations
This study has some limitations. First, we focused only on Italian-language tweets, whereas users may be distributed across different social media platforms and countries according to their usage, language and preferences. Nevertheless, the methods used in our study can be extended to other social media platforms. It is worth noting that the most recent statistics on social media usage show that approximately 83% of Twitter users worldwide were under the age of 50 [26]; Twitter-based studies therefore tend to underestimate the opinions of people aged 50 and over.
Additionally, Padilla et al. [27] found a geographical bias in their analysis of Twitter data, with urban areas over-represented among the Twitter users included in their study. Given this prior research, we assume that users from urban areas are over-represented in our data set as well.
The VADER model used for sentiment analysis takes a lexical approach: it relies on words that have been assigned predetermined positive or negative scores, and these scores come from a lexicon rated by human reviewers. This approach therefore also has some disadvantages:
• Misspellings and grammatical mistakes may cause the analysis to overlook important words or usage.
• Sarcasm and irony may be misinterpreted.
• Domain-specific jargon, nomenclature, memes or turns of phrase may not be recognized.
Regarding topic analysis, the primary limitation of unsupervised learning methods such as LDA is a degree of subjectivity in labelling the topics they produce [28].
Duplicated tweets
Bots posting on Twitter are a well-documented phenomenon [29,30]. One of the issues our study faced was the duplication of content due to bot activity on the topic of vaccines. Other research has also documented bot activity related to COVID-19 and COVID-19 vaccine misinformation [29,31]. The main problem this may cause in our analysis is that bot activity may inflate the importance of certain topics. To counter this, we removed retweets and duplicate tweets from the topic analysis, reducing the number of tweets from the initial 73,000 to approximately 26,000.
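The retweet and duplicate filtering described above can be sketched as follows. The exact criteria used in the study are not stated. This sketch assumes that retweets are recognized by the classic "RT @user:" prefix (a dataset may instead carry an explicit retweet flag) and that two tweets are duplicates when they match after lowercasing and collapsing whitespace. The example tweets are hypothetical.

```python
import re

RT_PREFIX_RE = re.compile(r"^RT @\w+:\s*")  # assumed retweet marker


def drop_retweets_and_duplicates(tweets):
    """Keep the first occurrence of each distinct non-retweet text."""
    seen, kept = set(), []
    for text in tweets:
        if RT_PREFIX_RE.match(text):
            continue                          # skip classic retweets
        key = " ".join(text.lower().split())  # case- and whitespace-insensitive key
        if key not in seen:
            seen.add(key)
            kept.append(text)
    return kept


sample = ["Prima dose fatta!", "RT @user: Prima dose fatta!", "prima  dose fatta!", "Seconda dose"]
print(drop_retweets_and_duplicates(sample))  # prints: ['Prima dose fatta!', 'Seconda dose']
```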

Discussion and conclusions
The current COVID-19 pandemic has resulted in a surge of social media use as a forum for discussing an array of topics about the pandemic, including vaccines. However, social media users can be exposed to negative sentiments and misinformation [32], which may influence individual views and lead to vaccine hesitancy or refusal [33].
Currently, little is known about public opinion regarding COVID-19 vaccines in Italy. In particular, identifying vaccine-hesitant opinions is of great importance, as the refusal of COVID-19 vaccination is concerning from a public health point of view.
To the best of our knowledge, this is one of the first studies on Twitter to identify opinions about COVID-19 vaccination in Italy and their progression over the first months of the vaccination campaign. Moreover, a better understanding of public opinion, obtained by categorizing Twitter content, is useful to identify how the vaccine hesitancy phenomenon arose in Italy. Our results can help policymakers and research communities track public attitudes towards COVID-19 vaccines and make decisions to promote the vaccination campaign.
In our study, sentiments towards COVID-19 vaccines were mostly negative (36.1%) or positive (35.1%), and only 28.8% were neutral (Table 2). The answers to our three main questions related to the vaccination campaign in Italy are as follows:
• The overall sentiment towards COVID-19 vaccines showed a slight predominance of people with negative attitudes. At the same time, the regional analysis showed an average neutral sentiment score in all Italian regions. In our opinion, these data indicate a uniform vaccination campaign across the different Italian areas.
• In Italy, the most talked-about COVID-19 vaccine brands were AstraZeneca and Pfizer-BioNTech, which accounted for 66.13% and 15.06% of total tweets respectively. Among vaccine brands, Italians preferred Sputnik-V, even though it was not available in Italy, and disliked AstraZeneca.
• Three topics were found to be the most popular in vaccine tweets. These themes remained quite stable over time and represent, respectively: (1) Vaccination plan info, (2) Usefulness of vaccinating and (3) Concerns about vaccines. Over the months, people began to have doubts not only about the quality of vaccines, but also about the risks of vaccination and the need to be vaccinated.
In conclusion, our analysis highlighted how fluctuations in opinions of COVID-19 vaccines and the evolution of topics were affected by news regarding vaccine side effects. Similarly, opinions of the different vaccine brands were shaped by news regarding their effectiveness and possible severe side effects.
Our sentiment categories and the topics identified from Twitter are in line with those of other studies covering the first months of the vaccination campaign in other countries.
In the US, a study of vaccine discussions on social media confirmed our results [34]. Its trend analysis found that topic peaks were driven by events reported in the news and spread through social media. Its sentiment analysis showed that 46.9% of the tweets were negative, 33.2% positive and 19.9% neutral, while its topic analysis found that the administration of and access to vaccines were among the major concerns. A study conducted by Fazel et al. [35] in the United Kingdom confirmed a predominance of tweets with negative vaccine content, which varied with major news announcements. In contrast, a study of English-language tweets about COVID-19 vaccines from all over the world revealed that the dominant sentiments were positive and neutral [36], although the main topics were again vaccine information and knowledge, vaccine hesitancy and severe side effects of the vaccines.
Future studies could investigate how perceptions and opinions about COVID-19 vaccines change in the coming months and years, using sources other than Twitter and combining results from different European countries. Furthermore, our model can be extended to other research problems, such as identifying misinformation on social media or training an LDA topic model to forecast event topics and trends.